Shotgun Metagenomic Data Analysis ◾ 323
The “-i” option specifies the input FASTA file of contigs, the “-a” option specifies the depth
file containing the average depth and variance of each contig, “-o” specifies the output path
and file-name prefix for the bins, “-v” enables verbose output, and “--seed” sets an integer
random seed so that the same results can be reproduced.
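To make the role of each option concrete, the following Python sketch assembles the same kind of command as an argument list. The program name `metabat2`, the file names, and the seed value are assumptions for illustration; substitute the binary and files used in the previous step.

```python
import shlex


def binning_command(contigs_fasta, depth_file, out_prefix, seed=1, verbose=True,
                    program="metabat2"):
    """Return the binning command as a list of arguments.

    `program` defaults to "metabat2", which is an assumption; use the
    binary from the previous step.
    """
    argv = [
        program,
        "-i", contigs_fasta,  # input contigs in FASTA format
        "-a", depth_file,     # per-contig depth averages and variances
        "-o", out_prefix,     # output path and prefix for the bin files
    ]
    if verbose:
        argv.append("-v")     # print progress information
    argv += ["--seed", str(seed)]  # fixed seed so that reruns give the same bins
    return argv


if __name__ == "__main__":
    cmd = binning_command("contigs.fasta", "depth.txt", "bins/bin", seed=1234)
    print(shlex.join(cmd))
    # metabat2 -i contigs.fasta -a depth.txt -o bins/bin -v --seed 1234
```

Building the command as a list (rather than one string) avoids quoting problems with paths and lets it be passed directly to `subprocess.run`.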
Up to this point, we have performed the taxonomic binning successfully, and we now
have a separate genome bin for each potential species in the metagenomic sample. However,
we do not yet know the quality of these genomes or to which microbial species they belong.
The next step, therefore, is to evaluate these genome bins and assess their completeness
based on the protein-coding genes they contain.
8.2.8 Bin Evaluation
Bin quality is usually assessed with CheckM [11], a collection of tools for assessing the
quality of genomes recovered from metagenomes, as well as genomes recovered from single
cells and isolates. CheckM provides estimates of genome completeness and contamination,
in addition to plots and other useful reports. For installation instructions, visit
“https://github.com/Ecogenomics/CheckM/wiki”. On Linux, CheckM requires the HMMER,
Prodigal, and pplacer programs to be installed and added to the system path. HMMER and
Prodigal can be installed with:
sudo apt update
sudo apt install hmmer
sudo apt install prodigal
To install pplacer, follow the instructions on its home page at
“https://matsen.fhcrc.org/pplacer/”, and add it to the Linux path. Then, you can install
CheckM and its Python dependencies with the following commands:
pip3 install numpy
pip3 install matplotlib
pip3 install pysam
pip3 install checkm-genome
Alternatively, you can install CheckM with Anaconda using either of the following commands:
conda install -c bioconda checkm-genome
conda install -c bioconda/label/cf201901 checkm-genome
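Before running CheckM, it can help to confirm that everything installed above is visible to the shell and to Python. This sketch uses only the standard library; treating `hmmsearch` as the HMMER executable to look for and `checkm` as the import name of the checkm-genome package are assumptions, and `missing_dependencies` is a hypothetical helper.

```python
import importlib.util
import shutil

# Executables that must be on PATH. "hmmsearch" stands in for HMMER
# (assumption); "checkm" is the command-line entry point of CheckM.
REQUIRED_PROGRAMS = ("hmmsearch", "prodigal", "pplacer", "checkm")
# Python modules installed by the pip/conda commands above; "checkm" is
# assumed to be the import name of the checkm-genome package.
REQUIRED_MODULES = ("numpy", "matplotlib", "pysam", "checkm")


def missing_dependencies(programs=REQUIRED_PROGRAMS, modules=REQUIRED_MODULES):
    """Report which executables and top-level Python modules cannot be found."""
    return {
        "programs": [p for p in programs if shutil.which(p) is None],
        "modules": [m for m in modules if importlib.util.find_spec(m) is None],
    }


if __name__ == "__main__":
    report = missing_dependencies()
    for kind, names in report.items():
        status = ", ".join(names) if names else "none missing"
        print(f"{kind}: {status}")
```

Run it inside the same environment (for example, the activated Conda environment) that will later run CheckM, since PATH and installed modules differ between environments.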
Now, we can run CheckM to assess the completeness and contamination of the genome
bins using lineage-specific marker sets. This workflow consists of several steps: placing
the bins in a reference genome tree, inferring a lineage-specific marker set for each bin,
identifying these marker genes in each bin, and estimating completeness and contamination
from them. These steps can be run as separate CheckM commands (“tree”, “lineage_set”,
“analyze”, and “qa”), or in a single step with the “lineage_wf” command.
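Both halves of this workflow can be sketched in Python. The first function only assembles a `checkm lineage_wf` command line; the directory names, the `fa` extension (CheckM looks for `.fna` files unless `-x` says otherwise), and the thread count are assumptions. The second function is a simplified illustration of how completeness and contamination can be derived from marker sets, modeled on the formulas described in the CheckM publication: each marker set contributes the fraction of its genes found, and every extra copy of a found gene counts toward contamination. It is not CheckM's own code and omits refinements such as strain heterogeneity; `marker_set_quality` and its inputs are hypothetical.

```python
import shlex


def lineage_wf_command(bin_dir, out_dir, extension="fa", threads=4):
    """Assemble a `checkm lineage_wf` call as an argument list.

    -t sets the number of threads; -x tells CheckM which file extension
    the bins use (CheckM expects "fna" unless told otherwise).
    """
    return ["checkm", "lineage_wf", "-t", str(threads), "-x", extension,
            bin_dir, out_dir]


def marker_set_quality(marker_sets, copy_counts):
    """Simplified completeness and contamination (%) from marker sets.

    marker_sets: list of non-empty sets of marker-gene IDs (hypothetical input)
    copy_counts: dict mapping a marker-gene ID to the number of copies found
                 in the bin; absent genes may be omitted.
    """
    if not marker_sets or any(len(s) == 0 for s in marker_sets):
        raise ValueError("need at least one non-empty marker set")
    completeness = contamination = 0.0
    for marker_set in marker_sets:
        counts = [copy_counts.get(gene, 0) for gene in marker_set]
        found = sum(1 for c in counts if c >= 1)       # genes present at least once
        extra = sum(c - 1 for c in counts if c >= 1)   # surplus copies beyond the first
        completeness += found / len(marker_set)
        contamination += extra / len(marker_set)
    n = len(marker_sets)
    return 100 * completeness / n, 100 * contamination / n


if __name__ == "__main__":
    print(shlex.join(lineage_wf_command("bins", "checkm_out")))
    # checkm lineage_wf -t 4 -x fa bins checkm_out
    # Two marker sets: {A, B} with only A found once, {C} with C found twice.
    print(marker_set_quality([{"A", "B"}, {"C"}], {"A": 1, "C": 2}))
    # (75.0, 50.0)
```

In the toy example, half of the first set and all of the second set are present, giving 75% completeness, while the duplicated gene C yields 50% contamination. To actually launch CheckM, the command list can be passed to `subprocess.run(cmd, check=True)`.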